Search CORE

4 research outputs found

EuFRATE: European FPGA Radiation-hardened Architecture for Telecommunications

Author: Azimi Sarah
Bozzoli Ludovica
Catanese Antonio
Charaf Najdet
Fazzoletto Emilio
Goehringer Diana
Kalms Lester
King Stephen
La Greca Salvatore Gabriele
Merodio Cordinachs David
Pertuz Sergio
Rizzieri Daniele
Scarpa Eugenio
Sterpone Luca
Wulf Cornelia
Publication venue: IEEE
Publication date: 01/01/2023
Field of study

The EuFRATE project aims to research, develop and test radiation-hardening methods for telecommunication payloads deployed for Geostationary-Earth Orbit (GEO) using Commercial-Off-The-Shelf Field Programmable Gate Arrays (FPGAs). This project is conducted by Argotec Group (Italy) with the collaboration of two partners: Politecnico di Torino (Italy) and Technische Universit¨at Dresden (Germany). The idea of the project focuses on high-performance telecommunication algorithms and the design and implementation strategies for connecting an FPGA device into a robust and efficient cluster of multi-FPGA systems. The radiation-hardening techniques currently under development are addressing both device and cluster levels, with redundant datapaths on multiple devices, comparing the results and isolating fatal errors. This paper introduces the current state of the project’s hardware design description, the composition of the FPGA cluster node, the proposed cluster topology, and the radiation hardening techniques. Intermediate stage experimental results of the FPGA communication layer performance and fault detection techniques are presented. Finally, a wide summary of the project’s impact on the scientific community is provided

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Methods and Algorithms for Efficient Programming of FPGA-based Heterogeneous Systems for Object Detection

Author: Kalms Lester
Publication venue
Publication date: 14/03/2023
Field of study

Nowadays, there is a high demand for computer vision applications in numerous application areas, such as autonomous driving or unmanned aerial vehicles. However, the application areas and scenarios are becoming increasingly complex, and their data requirements are growing. To meet these requirements, it needs increasingly powerful computing systems. FPGA-based heterogeneous systems offer an excellent solution in terms of energy efficiency, flexibility, and performance, especially in the field of computer vision. Due to complex applications and the use of FPGAs in combination with other architectures, efficient programming is becoming increasingly difficult. Thus, developers need a comprehensive framework with efficient automation, good usability, reasonable abstraction, and seamless integration of tools. It should provide an easy entry point, and reduce the effort to learn new concepts, programming languages and tools. Additionally, it needs optimized libraries for the user to focus on developing applications without getting involved with the underlying details. These should be well integrated, easy to use, and cover a wide range of possible use cases. The framework needs efficient algorithms to execute applications on heterogeneous architectures with maximum performance. These algorithms should distribute applications across various nodes with low fragmentation and communication overhead and find a near-optimal solution in a reasonable amount of time. This thesis addresses the research problem of an efficient implementation of object detection applications, their distribution across FPGA-based heterogeneous systems, and methods for automation and integration using toolchains. Within this, the three contributions are the HiFlipVX object detection library, the DECISION framework, and the APARMAP application distribution algorithm. HiFlipVX is an open-source HLS-based FPGA library optimized for performance and resource efficiency. It contains 66 highly parameterizable computer vision functions including neural networks, ideally for design space exploration. It extends the OpenVX standard for feature extraction, which is challenging due to unknown element size at design time. All functions are streaming capable to achieve maximum performance by increasing parallelism and reducing off-chip memory access. It does not require external or vendor libraries, which eases project integration, device coverage, and vendor portability, as shown for Intel. The library consumed on average 0.39% FFs and 0.32% LUTs for a set of image processing functions compared to a vendor library. A HiFlipVX implementation of the AKAZE feature detector computes between 3.56 and 4.13 times more pixels per second than the related work, while its resource consumption is comparable to optimized VHDL designs. Its neural network extension achieved a speedup of 3.23 for an AlexNet layer in comparison to a related work, while consuming 73% less on-chip memory. Furthermore, this thesis proposes an improved feature extraction implementation that achieves a repeatability of 72.57% when weighting complex cases, while the next best algorithm only achieves 62.99 %. DECISION is a framework consisting of two toolchains for the efficient programming of FPGA-based heterogeneous systems. Both integrate HiFlipVX and use a joint OpenVXbased frontend to implement computer vision applications. It abstracts the underlying hardware and algorithm details while covering a wide range of architectures and applications. The first toolchain targets x86-based systems consisting of CPUs, GPUs, and FPGAs using OpenCL (Open Computing Language). To create a heterogeneous schedule, it considers device profiles, kernel profiles and estimates, and FPGA dataflow characteristics. It manages synchronization, memory transfers and data coherence at design time. It creates a runtime optimized program which excels by its high parallelism and a low overhead. Additionally, this thesis looks at the integration of OpenCL-based libraries, automatic OpenCL kernel generation, and OpenCL kernel optimization and comparison for different architectures. The second toolchain creates an application specific and adaptive NoC-based architecture. The streaming-optimized architecture enables the reusability of vision functions by multiple applications to improve the resource efficiency while maintaining high performance. For a set of example applications, the resource consumption was more than halved, while its overhead was only 0.015% in terms of performance. APARMAP is an application distribution algorithm for partition-based and mesh-like FPGA topologies. It uses a NoC (Network-on-Chip) as communication infrastructure to connect reconfigurable regions and generate an application-specific hardware architecture. The algorithm uses load balancing techniques to find reasonable solutions within a predictable and scalable amount of time. It optimizes solutions using various heuristics, such as Simulated Annealing and Tabu Search. It uses a multithreaded grid-based approach to prevent threads from calculating the same solution and getting stuck in local minimums. Its constraints and objectives are the FPGA resource utilization, NoC bandwidth consumption, NoC hop count, and execution time of the proposed algorithm. The evaluation showed that the algorithm can deal with heterogeneous and irregular host graph topologies. The algorithm showed a good scalability in terms of computation time for an increasing number of nodes and partitions. It was able to achieve an optimal placement for a set of example graphs up to a size of 196 nodes on host graphs of up to 49 partitions. For a real application with 271 nodes and 441 edges, it was able to achieve a distribution with low resource fragmentation in an average time of 149 ms

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Supporting Utilities for Heterogeneous Embedded Image Processing Platforms (STHEM): An Overview

Author: Djupdal Asbjørn
Goehringer Diana
Jahre Magnus
Kalms Lester
Muddukrishna Ananya
Paolillo Antonio
Podlubne Ariel
Sadek Ahmad
Publication venue: Springer Verlag
Publication date: 01/01/2018
Field of study

The TULIPP project aims to simplify development of embedded vision applications with low-power and real-time requirements by providing a complete image processing system package called the TULIPP Starter Kit. To achieve this, the chosen high-performance embedded vision platform needs to be extended with performance analysis and power measurement features. The lack of such features plagues most embedded vision platforms in general and practitioners have adopted adhoc methods to circumvent the problem. In this paper, we describe four generic utilities that complement and refine the capabilities of existing platforms for embedded vision applications. Concretely, we describe a novel power measurement and analysis utility, a platform-optimized image processing library, a dynamic partial reconfiguration utility, and an utility providing support for using the real-time OS HIPPEROS within Xilinx SDSoC. Collectively, these utilities enable efficient development of image processing applications on the TULIPP hardware platform. In future work, we will evaluate the relative benefit of these utilities on key embedded image processing metrics such as frame rate and power consumption

Crossref

NORA - Norwegian Open Research Archives

TULIPP: Towards Ubiquitous Low-power Image Processing Platforms

Author: Bernard Guillaume
Christensen Flemming
Duhem Francois
Ehrenstråhle Carl
Göhringer Diana
Jahre Magnus
Kalb Tobias
Kalms Lester
Kjeldsberg Per Gunnar
Lemer Christian
Marty Fabien
Millet Philippe
Muddukrishna Ananya
Paolillo Antonio
Peterson Magnus
Pons Carlota
Rodriguez Ben
Ruf Boitumelo
Schuchert Tobias
Tchouchenkov Igor
Publication venue: IEEE
Publication date: 01/01/2016
Field of study

Many industrial domains rely on vision-based applications which require to comply with severe performance and embedded requirements. TULIPP will develop a reference platform, which consists of a hardware system, a tool chain and a real-time operating system. This platform defines implementation rules and interfaces to tackle power consumption issues while delivering high, energy efficient and guaranteed computing performance for image processing applications. Using this reference platform will enable designers to develop a complete solution at a reduced cost to meet the typical embedded systems requirements: Size, Weight and Power. Moreover, for less constrained systems which performance requirements cannot be fulfilled by one instance of the platform, the reference platform will also be scalable so that the resulting boards can be chained for higher processing power. The instance of the reference platform developed during the project will be use-case driven and split between the implementation of: a reference hardware architecture - a scalable low-power board; a low-power operating system and image processing libraries; a productivityenhancing tool chain. It will lead to three proof-of-concept demonstrators across different application domains: real-time and low-power medical image processing product prototype of surgical X-ray system (mobile c-arm); embedded image processing systems within Unmanned Aerial Vehicles (UAVs); automotive real time embedded systems for driver assistance. TULIPP will set up an ecosystem and will closely work with standardization organizations to propose new standards derived from its reference platform to the industry.© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Crossref

Fraunhofer-ePrints

NORA - Norwegian Open Research Archives